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During  the  contract  period;.of  May  1,  1978,  to  December  31,  1980, 
our  research  has  been  concentrated  in  the  following  five  major  areas: 

(1)  Smooth  Nonparametric  Regression  by  Quadratic  Programming f 

(2)  Integer  Programming  and  Optimal  Capacities  and  Locations  for 
Distribution  Centers ,' 

(3)  Project  Scheduling r “Statistical  Treatment  of  PERT  Critical 

P 

Path  Analysis,' 

(4)  Linear  Programming  in  Statistical  Estimation  Problems J  L^ 

„  j.  Estimation,'  and 

(5)  Statistical  Control  of  Nonlinear  Programming }  Confidence 
Limits  for  Global  Optima. 

A  brief  review  of  the  nature  of  Our  research  in  these  areas  as  well  as 
our  accomplishments  is  given  in  sections  1-5  below. 

We  have  extended  the  frontiers  of  research  in  the  interface  of 

r ' 

opeations  research  and  statistics.  In  addition,  our  research  during 

i 

this  contract  period  has  led  to  13  new  technical  reports,  5  professional 
publications,  3  Ph.D.  dissertations,  5  M.S.  degrees,  13  computer  algo¬ 
rithms,  4  presentations  at  national  statistical  meetings,  and 
4  presentations  at  international  operations  research/management  science 


-  * 
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1.  Smooth  Nonparametrlc  Regression  by  Quadratic  Programing 
*  1.1  Technical  Description 

The  most  frequently  used  specification  of  parametric  single  vari- 

t 

ate  regression  is  of  the  form 

yi  "  f ® +  ei  *  i  -  1,  ....  n  ,  (1.1) 

where  the  y  and  the  are,  respectively,  the  observed  responses  and 
inputs,  9  is  a  vector  of  unknown  parameters,  f  is  a  known  function  of 
x  and  9,  and  the  e^  form  a  set  of  n  independent  variables  usually  as¬ 
sumed  to  be  normally  distributed  with  mean  0  and  variance  a2;  i.e., 

e1  *  N (0 ,  a2)  .  (1.2) 

<  There  is  a  considerable  literature  generalizing  (1.2),  while  the  gener¬ 

alization  of  f  to  a  nonparametrlc  form  has  received  considerably  less 
attention. 

We  have  been  concerned  with  the  frequently  occurring  situations 
where  the  parametric  form  of  f  is  unknown.  Specifically,  we  deal  with 
the  model 


y±  -  f(x±)  +e1»  i  -  1,  ....  n 


(1.3) 


where  we  retain  assumption  (1.2)  but  do  not  assume  that  the  functional 
form  of  the  single  variate  regression  function  f(x)  is  known. 


Si 
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Clearly  without  any  assumption  on  f (x^  the  problem  is  both  trivial 
and  unsatisfactory  since  essentially  the  only  inference  we  are  able  to 
draw  is  that  y^  is  an  unbiased  normally  distributed  estimator  of  f(x). 
Although  in  most  applications  the  mathematical  form  of  f(x)  will  not  be 
known,  it  is  often  known  that  f(x)  is  a  "smooth  function".  The  class 
of  smooth  functions  considered  in  our  research  is  the  class  C(k,  M)  of 

.u  j  (k )  /i  \ 

functions  f(x)  for  which  the  k  derivative  — —  f(x)  3  f'  '(x)  is 

dx 

numerically  below  an  upper  bound  M;  so  that, 

|f(k)(x) j  <  M  for  L  <  x  <  U  (1.4) 

where  L  £  x  <  U  is  the  observational  range  of  the  x. 

In  our  view  the  class  of  functions  C(k,  M)  is  of  considerable  im¬ 
portance  for  many  reasons : 

(a)  Using  C(k,  M)  permits  a  parameterization  of  f(x)  and  associ¬ 
ated  maximum  likelihood  estimation  theory. 

Cb)  It  is  possible  to  derive  upper  and  lower  "confidence  curves" 
for  f (x)  within  C (k ,  M) . 

(c)  The  class  C(k,  M)  permits  the  use  of  finite  difference  cal¬ 
culus  and  thereby  a  "bias  control"  in  the  estimation  of  f(x). 

(d)  Insofar  as  practically  all  "higher  mathematical  functions" 
belong  to  a  C(k,  M)  class  for  some  combination  of  k  and  M, 
there  is  a  considerable  probability  that  this  class  will  con¬ 
tain  functions  f(x)  arising  from  blo-physlco-chemlcal  phe¬ 
nomena  unless  f(x)  has  known  singularities. 
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We  introduce  a  finite  difference  calculus  based  approximation  to 
the  class  C(k,  M)  which  permits  its  parameterization.  We  specify  that 
our  estimator  of  f(x)  shall  have  a  bias  that  does  not  exceed  a  pre¬ 
specified  value  e.  If,  for  example,  e  is  taken  to  be  equal  to  the 
measurement  precision  of  the  y^,  our  estimator  is  for  all  practical 
purposes  unbiased.  However,  it  is  quite  feasible  to  specify  larger 
values  of  e.  We  replace  the  target  function  f(x)  by  a  suitably  se¬ 
lected  Lagrangian  interpolation  polynomial.  Specifically,  we  consider 
a  finite  grid  of  p  values  of  x  denoted  by  £  ^ ,  j  «  1,  ....  p;  the  "tab¬ 
ular  values",  ■  f(£j),  associated  with  them;  and  the  Lagrangian 

Interpolation  polynomial 


<Hx,  rj) 


k 

I  v 

j'-l  J 


n  (x  - 

- 


V 


n  (5 

s*r 


y 


-  s8> 


(1.5) 


where  the  £j»  are  the  k  "nearest  neighbors"  of  x. 
choice  of  p  values  of  ,  it  is  possible  from  the 
finite  difference  calculus  to  infer  that 


Then,  for  a  suitable 
remainder  term  of 


jf(x)  -  $(x,  n)|  1  c  .  (1.6) 

Thus,  for  every  function  f(x)  belonging  to  C(k,  M) ,  there  is  an  element 
in  the  family  of  Lagrangian  interpolation  polynomials  $(x,  n)  satisfying 
(1.6). 

The  estimation  of  f(x)  proceeds  in  two  steps,  viz. 

(a)  the  estimation  of  the  tabular  ■  f(?j),  and 

(b)  the  estimation  of  the  f(x)  for  x  4  5 j . 
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The  $(x^,  n)  as  Implied  by  (1.5)  are  linear  functions  of  the 
parameters  ( J  ”  1,  ...»  p) .  Invoking  (1.2)  we  have  the  linear  least 

sq’iares  principle  equivalent  to  the  maximum  likelihood  estimation;  l.e., 

* 

the  maximum  likelihood  estimation  n  of  the  n  is  obtained  from 


min 

n 


(1.7) 


However,  in  order  to  utilize  the  information  fully,  we  Invoke  again 
basic  results  of  finite  difference  calculus.  The  k-th  order  divided 
differences  for  any  subset  (4  ...  4  ...£  £+^)  of  k  +  1  of  the  p  argu¬ 
ments  4  are  defined  by 


tq  q 


q+i* 


k+l 

I  K 

£-1 


I  n  u: 


and  represent  contrasts  of  the  4  related  to  the  k-th  derivative  by  the 
fundamental  equatioa 


4  ...  4+1l  -  f (k)  (C)/k! 


where  (  Is  an  argument  with  V  .  <  Z  <  V  .  The  divided  differences 

min  —  —  max 

are  Invariant  to  permutations  of  the  £'  and  are  usually  computed  by 

x» 

arranging  the  4  in  ascending  order  of  magnitute.  It  is  therefore  a 
consequence  of  (1.4)  that 


1  iq  q 


q+13 


(1.8) 
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for  any  of  the  (^^)  selections  of  a  set  of  k  +  1  arguments  ... 
out  of  the  tabular  arguments  Cj .  We  have  shown  that  the  Inequalities 
(1.8)  need  only  be  set  up  for  the  p-k  selections  of  which  are  k  +  1 

consecutive  arguments.  This  means  we  set  up  (1.8)  for  [£^  ^  S3  ••• 

U2  S3  ...  Sk+21  •••  ^p-k+i  •••  The  reason  for  this  is  that, 

for  any  other  selection  of  £^,  ‘  •  S^+1l  will  lie  within  the  con¬ 

vex  closure  of  the  special  set. 

The  restricted  maximum  likelihood  estimation  is  therefore  achieved 
by  the  quadratic  programming  problem  of  minimizing  S2  given  by  (1.7) 
subject  to  the  linear  inequalities  (1.8).  We  remark  that,  for  M -*•  0, 
the  equations  (1.8)  imply  that  all  $(x^,  n)  are  reducible  to  a  k  -  1 
degree  polynomial  of  the  form 


k-i 

$(x. ,  n)  »  I  *  *. 

1  t-0  1 


so  that  we  have  a  polynomial  least  squares  estimation  as  a  limiting 
case  of  our  general  methodology. 

Having  estimated  the  n j  ■  f(£j)  by  n j ,  the  estimates  f(x)  for 
intermediate  values  of  x  will  be  provided  by  a  "central"  Lagrangian- 
(or  Newton-)  interpolation  formula  of  the  form  $(x,  n)  given  by 


f(x)  =  <fr(x,  n) 


k  „ 

l  V 

j'-l  J 


n  (x  -  (•!) 

afl' 

it  (cJt  -  O 
sVj' 


(1.9) 


where  the  £  ^ ,  are  a  sequence  of  k  of  the  £j  arguments  "nearest"  to  x. 
The  estimators  (1.9)  are  "maximum  likelihood"  estimators  since  they 
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are  functions  of  the  maximum  likelihood  estimators  rij*  However,  they 
are  maximum  likelihood  estimators  of  the  Lagrangian  interpolates  $(x,  n) 
given  by  (1,5)  which  differ  from  the  target  parameters  f(x)  by  less 
Chan  e. 


1.2  Research  Status 

Technical  Report  66  describes  how  noisy  observations  on  a  univariate 
function  f(x)  with  unknown  functional  form  but  with  !f^(x)|  can 
be  used  to  model  f.  A  "table"  of  estimated  values  of  f(x)  is  con¬ 
structed  for  an  evenly  spaced  grid  of  x  values.  If  f(x)  is  the  corres¬ 
ponding  (k  -  l)-th  degree  Lagrangian  interpolating  polynomial,  then  f(x) 
has  bias  £  e.  For  given  e  and  M  the  estimator  f (x)  is  smooth  in  the 
sense  that  it  is  a  piecewise  (k  -  l)-th  degree  polynomial  with  exten- 

a 

slons  of  f(x)  from  one  piece  to  an  adjacent  pieve  agreeing  to  within  e. 
The  consistency  of  f (x)  is  proven  in  Technical  Report  66 ,  and  confidence 

a 

statements  based  upon  f(x)  are  documented.  Some  numerical  experience 
with  our  smooth  nonparametric  regression  procedure  on  both  real  and 
simulated  data  is  reported  in  Appendix  3  of  Technical  Report  66. 

Technical  Report  67  contains  a  documentation  of  a  computer  program, 
NPREG,  implementing  our  smooth  nonparametric  regression  procedure  as 
well  as  a  review  of  alternative  spline  regression  methods.  A  copy  of 
the  computer  program  can  be  obtained  from  Robert  L.  Sielken  Jr. ,  Insti¬ 
tute  of  Statistics,  Texas  A&M  University,  College  Station,  Texas  77843. 

An  abbreviated  version  of  Technical  Report  66  has  been  submitted 
for  publication  in  the  Journal  of  the  American  Statistical  Association. 
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Dr.  H.  0.  Hartley  made  two  presentations  concerning  our  smooth  non- 
parametric  regression  procedure  at  national  statistical  meetings.  The 
most  recent  presentation  was  in  August,  1980,  at  the  joint  national 
meetings  of  the  American  Statistical  Association  and  the  Biometric 
Society. 

Mr.  Nicolas  Bocquet  and  Mr.  Arnaud  Nougues  both  wrote  Master  of 
Science  theses  concerning  smooth  nonparametric  regression  and  were 
supported  by  the  contract. 


\ 

I 
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2.  Integer  Programming  and  Optimal  Capacities  and  Locations  for 


Distribution  Centers 

The  ingredients  in  the  distribution  center  problem  are 

(i)  the  possible  locations  for  distribution  centers, 

(ii)  the  locations  and  needs  of  the  potential  users  of  the 
distribution  centers,  and 

(ill)  the  different  capacities  a  distribution  center  can  have. 

The  problem  is  to  determine 

(a)  the  number  of  distribution  centers  to  create, 

(b)  the  location  and  capacity  of  each  distribution  center 
created,  and 

(c)  how  much  each  potential  user  should  actually  utilize  each 
distribution  center. 

The  objective  is  to  identify  the  minimum  cost  distribution  system. 

Some  examples  of  distribution  center  problems  are  as  follows: 

(i)  The  distribution  centers  could  be  super-tanker  ports  with 
the  users  being  the  U.S.  oil  burning  utility  plants  and  the 
costs  being  construction,  processing,  and  transportation 
costs. 

(ii)  The  distribution  centers  could  be  intermediate  collection  - 
dispersion  points  such  as  cotton  gins  which  receive  cotton 
from  the  individual  farms  (the  users)  and  then  send  the 
processed  cotton  on  to  warehouses  or  mills  so  that  the  costs 
are  construction,  processing,  and  transportation  costs  both 
to  and  from  the  center. 


(iii)  The  distribution  centers  could  be  ambulance,  fire,  or  police 
stations  with  the  users  being  major  need  locations  and  the 
costs  in  terms  of  creation  and  maintenance  costs  as  well  as 
response  time. 

Thus  distribution  centers  can  be  receivers,  senders,  or  both  and  the 
costs  either  monetary  or  in  terms  of  time  or  both.  A  whole  host  of 
problems  both  military  and  civilian  can  be  formulated  as  distribution 
center  problems. 

It  should  also  be  noted  that  the  concept  of  capacities  for  distri¬ 
bution  centers  is  equivalent  to  the  concept  of  a  distribution  center 
with  a  piecewise  linear  cost  function  (possibly  not  continuous) . 

The  determination  of  the  optimal  capacities  and  locations  for 
distribution  centers  is  a  mixed  integer  linear  programming  problem. 

Our  research  has  focused  on  the  following  three  general  procedures 
for  solving  the  distribution  center  problem: 

(A)  Determine  the  centers*  locations  and  capacities  simultaneously. 

(B)  In  Step  1  determine  the  centers'  locations  and  then  in  Step  2 
determine  their  best  capacities  given  the  locations. 

(C)  In  Step  1  determine  the  centers’  capacities  and  then  in 
Step  2  determine  their  best  locations  given  the  capacities. 

The  specific  problems  that  must  be  solved  in  these  procedures  are 
given  below.  The  following  notation  simplifies  the  formulation  of 
these  problems. 

I  -  number  of  users; 

J  ■  number  of  possible  center  locations; 

K  -  number  of  different  center  sizes; 
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■  supply  of  need  corresponding  to  user  1,  1*1,2,...,  I; 

*  capacity  of  a  center  of  size  k,  k  ■  1,2,...,  K; 

*  cost  of  processing  one  unit  of  need  at  a  center  of 
size  k; 

*  fixed  cost  associated  with  a  center  of  size  k; 

*  cost  of  transporting  one  unit  of  need  between  user  i 
and  center  j ; 

■  the  number  of  centers  of  size  k  at  location  j, 

k  —  1,2,...,  Kj  J  *  1,2,...,  J | 

J 

*  E  n  ,  ■  the  total  number  of  centers  of  size  k; 
j-1  J 

K 

*  Z  N,  ■  the  total  number  of  centers; 
k*l 

*  1  if  there  is  a  center  located  at  j ,  0  if  no  center  is 

located  at  j; 

J 

•EL.*  the  total  number  of  locations  with  centers; 
j-1  J 

»  the  number  of  units  of  need  transported  between  user  i 
and  location  j ; 

*  the  number  of  units  of  need  processed  at  location  j 
in  a  center  of  size  k;  and 


Y.  *  E  Y  .  -  the  total  number  of  units  of  need  processed 
k  j-1 

in  the  centers  of  size  k. 

The  constants  I,  J,  K,  S^,  R^,  cfc,  ffc,  and  t^  are  given  and 
represent  characteristics  of  the  problem;  while  n^>  N^>  N,  L^,  L, 
X^,  Y^  and  Y^  represent  the  decision  variables  for  which  values 
are  to  be  determined. 


■k  A  v  ^ 


12 


Procedure  A 


Solve 


I  J 

min  E  Z  t  .X  . 
i-1  j-1  13  13 


J 

Z 


K 


K 


**  ck^jk 
j-1  k-1  K  3* 


J 

+  Z 
j-1  k-1 


1,  £k”jk 


subject  to 
I 

E  X 


i-1 


ij 


K 

E  Y  ,  for  j  -  1, 
k-1  3 


,  J; 


(2.1) 


(2.2) 


^  f  for  j  *  •  •  • »  *J  sud  k  *  li  • » •  >  Kj 


E  X  .  -  S.  ,  for  i  -  1,  ...,  I; 
j-1  3 

X  .  _>  0  ,  Y  .  >_  0  ,  for  all  i,  j,  k;  and 
ij 


n^k  -  0  or  1  ,  for  all  j,  k  . 


(2.3) 

(2.4) 

(2.5) 

(2.6) 


The  objective  function  (2.1)  represents  the  total  cost  consisting 

of  the  transportation  cost,  E  E  t. .  X.  ,  the  processing  cost 

i  j  13  13 

E  E  c,  Y  ,  and  the  fixed  center  cost,  E  E  f,  n  .  Constraint 

j  k  *  3  j  k  3fc 

(2.2)  requires  that  the  amount  of  need  transported  at  a  location, 

EX.  ,  equals  the  amount  of  need  processed  at  that  location,  E  Y 


ij 


k 


Jk' 


In  (2.3)  the  amount  of  need  processed  at  location  j  by  a  center  of 
size  k  is  required  to  not  exceed  the  capacity  of  such  centers  and 
be  zero  if  there  are  no  centers  of  size  k  at  location  j .  Constraint 
(2.4)  requires  that  all  of  the  supply,  S^,  of  need  corresponding  to 
the  i-th  user  be  transported. 

The  solution  to  (2.1)  -  (2.6)  is  a  guaranteed  optimal  solution 
to  the  distribution  center  problem. 
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Procedure  B 

Step  1.  From  among  all  possible  locations,  determlng  the  set 

of  L  center  locations  that  minimizes  total  transportation 
costs.  Do  this  for  L  -  1,  2 . 

For  a  given  value  of  L,  solve 

I  J 

min  £  £  t  X  (2.7) 

i-1  j-1  13  13 

subject  to 

J 

I  X  -  S.  ,  for  i  -  1,  ....  I  ;  (2.8) 

j-1  13  1 

I  I 

E  X  <  L  (  E  S  )  ,  for  j  -  1 . J  ;  (2.9) 

i-1  J  i-1  1 

J 

E  L  -  L  ,  (2.10) 

j-1  3 

Xjj  _>  0  ,  for  all  i,  j;  and  (2.11) 

-  0  or  1  ,  for  j  -  1,  ...»  J  .  (2.12) 

The  purpose  of  the  L^'s  i*1  (2.9)  and  (2.10)  is  to  insure  that 
exactly  L  locations  have  centers  and  that  the  needs  are  only 
transported  at  these  locations. 

Step  2.  For  each  value  of  L  and  its  associated  set  of  optimal  plant 
locations,  determine  the  number  and  sizes  of  plants  at 
each  location  which  minimizes  total  plant  costs. 
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For  each  value  of  L,  the  total  cost  is  simply  the  sum  of  center 
costs  at  each  of  the  locations  where  ■  1.  For  a  particular  loca¬ 
tion  j,  where  ■  1,  the  minimum  center  cost  solution  is  obtained 


from: 


subject  to 


K  K 

min  2  c.  X.  +  l 

k-1  *  *  k-1  3 


(2.13) 


K  I 

z  Y  -  z  x4J  ; 

(2.14) 

k-lJ*  J-l13 

v  -  ’  f°r  k  ■ 1 . Ki 

(2.15) 

Y  ,  >  0  ,  for  k  -  1,  ....  K,  and 

(2.16) 

jk  - 

3k  'O'  l"  2-  . K: 

(2.17) 

where  in  this  problem  the  X^'s  are  constants  whose  values  are 
determined  in  Step  1  for  the  particular  value  of  L. 

Step  3.  For  each  value  of  L,  aggregate  transportation  cost  from 
Step  1  with  plant  costs  from  Step  2.  The  value  of  L 

..  .  .  «  It  _ 
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Procedure  C 

Step  1.  Minimize  the  center  costs  for  alternative  values  of  N. 

For  a  specific  value  of  N  the  optimal  center  size 
configuration  (N^,  N 

K  K 

min  Z  c  Y  +  Z  f  N  (2.18) 

k-1  k-1 


2 i  . . . ,  N^)  is  determined  from: 


subject  to 

K  I 

n  ■  Z  S.  ;  (2.19) 

k-1  K  i-1  1 


Yk  <  RlcNk  ,  for  k  -  1,  ....  K  ;  (2.20) 


K 

Z  N  -  N  ;  (2.21) 

k-1  * 


Yk  >  0  ,  for  k  -  1,  ....  K  ;  (2.22) 

Nk  -  0,  1,  2,  ...  for  k  -  1,  ...,  K  (2.23) 

Constraint  (2.19)  insures  that  all  of  the  needs  are  processed 
somewhere.  In  (2.20)  the  total  amount  of  needs  processed  by 
centers  of  size  k  does  not  exceed  their  combined  capacity. 

Step  2.  Minimize  the  transportation  costs  associated  with  each 
value  of  N  and  the  corresponding  optimal  center  size 
configuration  (N^ ^ ,  ...» 
by  solving  the  following  problem: 


Ng) .  This  is  accomplished 


1 


I  J  J  K 

nln  Z  Z  t..X  +  Z  Z  <lY  (2.24) 

i-1  j-1  13  13  j-1  k-1  *  3 K 

subject  to 

J 

Z  S  -  S.  ,  for  i  -  1,  ....  I  ;  (2.25) 

j-1  13 


I  K 

Z  X  -  Z  Y  .  for  J  -  1,  ....  J  ;  (2.26) 

i-1  13  k-1  3 


Y  1  »  for  J  *  1»  •  •  • »  J  k-1,  . . . ,  K  ;  (2.27) 


J 

Z  n  -  N  ,  for  k-1 . K  ;  (2.28) 

j-1  3 


Xij  •  Yjk-°  ’  f°r  ^  i’  J’  k 


(2.29) 


Jk 


-  0  or  1  ,  for  all  j ,  k 


(2.30) 


When  the  solutions  from  Steps  1  and  2  are  combined  for  a  particular 
N,  the  corresponding  total  of  the  transportation  and  plant  costs  is 


I  J  J  K  J 

Cost  (N)  -  Z  Z  t  X..  +  Z  2  +  c.  Y  .  +  Z 

i-1  j-1  13  3  j-1  k-1  3* 


E  fknnk  (2*31) 
j-1  k-1 


The  value  of  N  which  minimizes  Cost(N)  implies  a  "near  optimal"  solution 


Computer  Implementations  of  all  three  procedures  were  prepared. 
Contact  Dr.  R.  L.  Slelken,  Jr.  for  these  programs. 

While  we  were  preparing  algorithms  to  solve  the  distribution 
center  problem,  it  became  apparent  that  the  algorithm  user  should 
be  able  to  select  his  solution  strategy  by  choosing  from  among 
various  options  for  each  of  several  algorithm  factors.  However, 
associated  with  the  flexibility  of  being  able  to  select  options  is 
the  problem  of  determining  which  combination  of  options  is  "best". 

In  fact,  this  latter  problem  frequently  confronts  algorithm  users 
not  only  in  distribution  center  problems,  but  also  in  the  more 
general  areas  of  mixed  integer  programming,  all  Integer  programming, 
nonlinear  programming,  operations  research,  life,  etc.  We  decided 
to  illustrate  a  general  statistical  approach  to  answering  the 
question  of  which  combination  of  options  is  best.  The  illustration 
was  in  the  context  of  a  new  integer  linear  programming  algorithm 
where  "best"  was  quickest. 

Dr.  William  Riley  prepared  a  new  very  general  all  integer 
programming  algorithm  called  SLIP  (Solves  Linear  Integer  Problems) . 
Several  of  the  best  suggestions  in  the  literature  were  incorporated 
into  a  single  algorithm,  along  with  several  new  ideas.  The  four 
factors  in  SLIP  where  the  user  must  select  an  option  are  (1) 
augmenting  partial  solutions,  (2)  backtracking,  (3)  fathoming 
on  the  basis  of  binary  feasibility  and  optimality  indicators , and 
(4)  use  of  linear  programming  on  the  relaxed  problem  which  Includes 
penalties,  cuts,  surrogate  constraints,  and  associated  fathoming. 

By  proper  choice  of  options  in  SLIP,  the  experimenter  may  configure 
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SLIP  to  Imitate  any  of  over  14,000  possible  algorithms.  The 
experimenter  may  wish  to  evaluate  the  performance  of  only  one 
algorithm  on  a  particular  class  of  problems  or  may  wish  to  compare 
the  performance  of  several  algorithms.  Additionally,  SLIP  may  be 
used  by  an  experimenter  who  contributes  one  or  two  new  methods 
for  solving  integer  linear  programming  problems.  The  experimenter 
can  add  these  new  methods  to  SLIP  and  find  the  combination  of  options 
in  SLIP  which  best  enhances  the  new  methods. 

Dr.  William  Riley's  dissertation  develops  and  illustrates  the 
use  of  statistical  experimental  designs  (especially  fractional 
factorial  designs)  and  analysis-of-variance  techniques  in  determining 
the  average  usefulness  of  algorithm  options  and  combinations  of 
options.  While  this  analysis  cannot  necessarily  identify  the  best 
combination  of  options  for  a  particular  problem,  it  can  make  economi¬ 
cally  possible  the  identification  of  superior  combinations  of  options 
for  broad  classes  of  problems.  Furthermore  the  average  usefulness 
of  options  can  direct  and  focus  research.  We  recommend  the  use  of 
these  general  statistical  techniques  to  those  developing,  testing, 
and  validating  software. 

The  computer  implementation  of  the  new  all  integer  linear 
programming  problem  solver,  SLIP,  is  documented  in  Dr.  Riley's 
dissertation.  The  program  itself  is  available  through  Dr.  R.  L. 
Slelken,  Jr.,  Institute  of  Statistics,  Texas  A&M  Univera ity.  College 
Station,  Texas  77843. 
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Dr.  Sielken  gave  an  Invited  presentation  concerning  SLIP  and 
the  new  statistical  procedures  for  selecting  combinations  of  options 
at  the  Mathematical  Programming  Society's  Committee  on  Algorithms 
(COAL)  Conference  on  Developing,  Testing,  and  Validating  Software 
at  Boulder,  Colorado,  in  January,  1981.  The  proceedings  of  that 
conference  are  to  be  published  and  will  contain  a  paper  jointly 
authored  by  Dr.  Riley  and  Dr.  Sielken. 
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3.  Project  Scheduling:  Statistical  Treatment  of  PERT  Critical 
Path  Analysis 
3.1  Technical  Description 

A  new  project  scheduling  procedure  called  Statistical  PERT  has 
been  developed  at  the  Institute  of  Statistics,  Texas  A&M  University. 
Basically  the  scheduler  minimizes  the  cost  of  satisfying  a  specified 
project  deadline.  Each  project  activity  has  a  random  duration.  The 
distribution  of  such  a  duration  depends  on  the  mean  duration  selected 
by  the  scheduler.  It  costs  more  to  select  a  smaller  mean  duration. 

Hore  specifically,  the  cost  of  an  activity  is  assumed  to  be  convex 
piecewise  linear  function  of  the  activity's  mean  duration.  The  problem 
is  to  determine  the  activity  mean  durations  which  both  minimize  the 
total  project  cost  and  insure  that  the  mean  (or  some  specified  percentile) 
of  the  corresponding  project  completion  time  distribution  is  less  than  or 
equal  to  a  specified  project  deadline.  The  entire  distribution  of  the 
project's  completion  time  under  the  minimum  cost  schedule  is  a  valuable 
by-product . 

The  new  project  scheduling  procedure  allows  the  project  scheduler 
to  specify 

i)  the  precedences  among  the  project's  activities, 

ii)  the  relationship  between  an  activity's  cost  and  its  mean 
duration, 

iii)  the  manner  in  which  an  activity's  actual  duration  varies 
about  its  mean  duration,  and 
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lv)  a  deadline  for  either  the  project's  mean  completion  time 
or  a  prescribed  percentile  of  the  project  completion  time 
distribution. 

In  return  the  project  scheduler  receives 

i)  a  mini muni  cost  project  schedule  which  delineates  each 
activity's  mean  duration  time, 

ii)  an  estimate  of  the  distribution  of  the  project  completion 
time, 

iii)  information  on  the  trade-off  between  the  project's 
minimum  cost  and  its  specified  deadline,  and 

iv)  a  tool  for  monitoring  the  project's  progress  and,  if 
need  be,  rescheduling. 

An  exciting  feature  of  this  new  project  scheduling  procedure  is 
that  is  simultaneously  incorporates  the  desire  to  minimize  the  project 
cost  and  the  realization  that  an  activity's  duration  is  not  necessarily 
a  fixed  quantity  exactly  equal  to  its  prescribed  duration  but  rather  a 
random  quantity  varying  about  a  prescribed  duration.  No  longer  must 
the  project  scheduler  either  (i)  choose  a  reasonable  cost  schedule  which 
heurlstlcally  hedges  against  the  randomness  in  the  activities  he  guesses 
will  be  critical,  or  (il)  choose  a  reasonable  schedule  which  should 
probably  finish  before  the  deadline  and  then  guess  where  he  can  save 
money  without  disturbing  the  suspected  completion  time  too  much.  By 
considering  both  cost  and  randomness  together  in  one  systematic  algorithm, 
the  new  project  scheduling  procedure  eliminates  this  guesswork. 
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3.2  Research  Status 

The  culmination  of  many  years  of  research  on  project  scheduling 
optimization  occurred  during  this  contract  period. 

The  following  17  technical  reports  relate  to  our  research  on 
project  scheduling: 

No.  2;  "Optimum  Time  Compression  in  Project  Scheduling" ,  L.  R. 
Lamberson  and  R.  R.  Hocking,  May  1968 

No.  46;  "Applications  of  Graph  Theory  to  PERT  Critical  Path  Analysis", 
E.  Arseven  and  H.  0.  Hartley,  September  1974 

No.  48;  "Statistical  Critical  Path  Analysis  in  Acyclic  Stochastic 
Networks:  Statistical  PERT",  R.  L.  Sielken,  Jr.,  L.  J.  Ringer, 

H.  0,  Hartley,  and  E.  Arseven,  November  1974 

No.  50;  "Statistical  PERT:  Decomposing  a  Project  Network:,  R.  L. 
Sielken,  Jr.  and  Norman  E.  Fisher,  January  1976 

No.  51;  "Statistical  PERT:  An  Improved  Subnetwork  Analysis  Procedure", 
R.  L.  Sielken,  Jr.,  H.  0.  Hartley  and  R.  K.  Spoeri,  January  1976 

No,  52;  "Incorporating  Project  Cost  Considerations  Into  Stochastic 
PERT",  Paul  P.  Biemer  and  R.  L.  Sielken,  Jr.,  November  1975 

No.  53;  "A  Statistical  Procedure  for  Optimization  of  PERT  Network 

Scheduling  Systems",  R.  K.  Spoeri,  L.  J.  Ringer  and  R.  L.  Sielken, 
Jr.,  April  1976 

No.  55;  "Statistical  PERT:  An  Improved  Project  Scheduling  Algorith", 

C.  S.  Dunn  and  R.  L.  Sielken,  Jr.,  February  1977 

No.  56;  "A  New  Statistical  Approach  to  Project  Scheduling",  R.  L. 
Sielken,  Jr.  and  H.  0.  Hartley,  December  1977 

No.  57;  "A  User's  Guide  to  the  Computer  Implementation  of  the  New 

Project  Scheduling  Procedure:  Statistical  PERT",  Thomas  C.  Baker, 
Jr.  and  R.  L.  Sielken,  Jr.,  August  1978 

No.  58;  "Statistical  PERT:  Improvements  in  the  Determination  of  the 
Project  Completion  Time  Distribution",  Thomas  C.  Baker,  Jr.  and 
R.  L.  Sielken,  Jr.,  August  1978 

No.  59;  "Multivariate  Edgeworth  and  Gram-Charlier  Expansions  and  their 
Applications  to  Statistical  PERT",  Christian  C.  Robieux,  H.  0. 
Hartley,  Frederic  C.  Lam,  R.  L.  Sielken,  Jr.,  September  1980 
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No.  60;  "Statistical  PERT:  The  Precision  of  the  Estimated  Project 
Completion  Time  Distribution",  Christian  C.  Robieux,  H.  0. 
Hartley,  Frederic  C.  Lam,  R.  L.  Sielken,  Jr.,  September  1980 

No.  61;  "Project  Scheduling  with  Discontinuous  Piecewise  Convex 
Activity  Cost  Functions",  Christian  C.  Robieux  and  Robert  L. 
Sielken,  Jr.,  September  1978 

No.  63;  "Evaluation  of  Precedence  Criteria  and  Project  Scheduling 

under  Resource  Constraints",  Shi  Min  Chang  and  Robert  L.  Sielken, 
Jr.,  May  1980 

No.  68;  "Estimating  the  Distribution  Function  of  a  Transformed 
Random  Vector" ,  T.  C.  Baker,  Jr.  and  R.  L.  Sielken,  Jr., 

September  1980 

No.  69;  "Estimation  of  a  Distribution  Function  by  Extrapolating 
Upper  and  Lower  Bounds",  T.  C.  Baker,  Jr.  and  R.  L.  Sielken, 

Jr.,  September  1980 


Technical  Report  No.  56,  "A  New  Statistical  Approach  to  Project 
Scheduling,"  provides  an  overview  of  the  new  project  scheduling 
procedure  and  the  8  technical  reports  on  which  it  is  based  as  well 


i)  a  thorough  explanation  of  the  project  scheduling  problem 
itself  and  the  need  for  the  new  procedure, 

ii)  a  non-technical  description  of  the  five  general  steps  in 
the  new  iterative  project  scheduling  algorithm,  and 

iii)  an  illustrative  numerical  example  of  the  procedure's 
performance. 

A  similar  overview  was  published  in  Decision  Information,  C.  P. 
Tsokos  and  R.  M.  Thrall  (editors).  Academic  Press,  1979.  An  earlier 
introduction  to  Statistical  PERT  was  published  in  SCIMA,  Vol.  6, 

No.  3,  1977. 
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In  order  Co  make  Che  new  projecc  scheduling  algor IChm  easy  Co 
use,  Che  nine  basic  compuCer  programs  have  been  linked  CogeCher  inCo 
one  auComaCed  sysCem.  A  user  now  is  only  required  Co  inpuC  one 
descripcion  of  his  problem  and  Che  compucer  sysCem  provides 

i)  a  documencacion  of  Che  problem  descripcion, 

ii)  Che  individual  scheduled  acCiviCy  mean  duracions  which  boch 
minimize  Che  cocal  projecC  cose  and  insure  chaC  Che  mean  of 
Che  projecc  complecion  Cime  disCribuCion  (or  some  specified 
percenCile  Chereof)  is  less  Chan  or  equal  Co  Che  specified 
projecc  deadline,  and 

iii)  an  escimace  of  Che  projecc  complecion  cime  disCribuCion 
corresponding  co  Che  minimum  cose  schedule,  as  well  as 

iv)  informaCion  on  Che  Crade-off  beCween  Che  projecc’ s  minimum 
cose  and  ics  specified  deadline. 

Technical  Report  No.  56  also  describes  how  the  system  can  be  used  to 
monitor  a  project  in  progress.  A  technical  documentation  of  the 
computer  system  and  the  user  instructions  is  given  in  Technical 
Report  No.  57,  "A  User's  Guide  to  the  Computer  Implementation  of  the 
New  Project  Scheduling  Procedure:  Statistical  PERT". 

Several  new  analytical  techniques  have  been  incorporated  into  the 
project  scheduling  algorithm.  In  particular  Technical  Reports  No.  53 
and  58  document  a  new  method  of  estimating  the  project  completion  time 
distribution  which  is  more  amenable  to  very  large  projects  and  provides 
increased  flexibility  in  describing  individual  activity  duration  distri¬ 
butions  which  are  non- symmetric.  Conversations  with  Dr.  Thrall  at  Rice 
University  and  others  have  indicated  that  such  skewed  activity  duration 
distributions  occur  quite  frequently  in  practice  and  that  the  ability 
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to  Incorporate  such  distributions  into  our  system  is  a  definite 
advantage. 

Another  significant  improvement  is  the  development  and  imple¬ 
mentation  of  a  new  procedure  for  finding  a  minimum  cost  schedule 
when  each  activity's  duration  is  exactly  its  mean  duration.  This 
deterministic  scheduling  is  now  done  using  a  highly  efficient  network- 
flow  algorithm  which  is  a  generalization  of  a  procedure  described  by 
Fulkerson.  The  new  deterministic  scheduler  allows  the  project  scheduling 
algorithm  to  schedule  projects  with  as  many  as  3000  activities  whereas 
the  old  practical  limit  was  around  100.  In  addition  the  new  deterministic 
scheduler  allows  an  activity's  cost  to  be  a  convex  piecewise  linear 
function  of  time  which  is  quite  realistic.  Furthermore  the  new  deter¬ 
ministic  scheduler  provides  a  project  cost  curve  which  depicts  the 
total  deterministic  project  cost  as  a  function  of  the  specified  deter¬ 
ministic  project  deadline.  This  new  deterministic  scheduler  is  docu¬ 
mented  in  Technical  Report  No.  55. 

Technical  Report  No.  61  shows  how  the  new  deterministic  project 
scheduler  can  also  be  used  to  treat  discontinuous  activity  cost 
functions  which  can  represent  situations  where  alternative  ways  of 
performing  an  activity  have  different  fixed  or  overhead  costs. 

When  a  project  contains  roughly  a  thousand  or  more  activities, 
it  is  sometimes  economical  to  incorporate  sampling  into  the  estima¬ 
tion  of  the  project  completion  time  distribution.  One  aspect  of 
this  sampling  has  suggested  several  alternatives  to  the  empirical 
distribution  function  which  would  be  the  "standard"  estimate.  The 
effectiveness  of  these  alternatives  in  the  project  scheduling  algo¬ 
rithm  was  investigated  in  a  Monte  Carlo  study.  The  alternative 


estimates  of  the  project  completion  time  distribution  and  the  Monte 
Carlo  study  are  discussed  in  Technical  Report  No.  58.  The  best  of 
the  alternatives  has  been  incorporated  into  Statistical  PERT.  Since 
the  project  completion  time  can  be  considered  as  a  transformation  of 
the  individual  activity  durations,  our  research  findings  could  be 
generalized  to  the  broader  problem  of  estimating  the  distribution 
function  of  a  transformed  random  variable.  Technical  Report  No.  68 
describes  this  generalization  and  was  published  in  Communications  in 
Statistics  in  1980. 

A  second  aspect  of  the  sampling  involves  a  sequence  of  distri¬ 
butions  which  converge  monotomically  to  the  project  completion  time 
distribution  from  above  and  a  related  sequence  which  converges  mono- 
tonically  from  below.  The  estimation  of  the  project  completion  time 
distribution  on  the  basis  of  the  first  few  elements  in  the  sequences 
has  also  been  examined.  The  documentation  of  this  examination  and 
its  impact  on  Statistical  PERT  is  included  in  Technical  Report  No.  58. 
The  general  problem  of  estimating  a  distribution  function  by  extra¬ 
polating  a  sequence  of  its  upper  and  lower  bounds  is  discussed  in 
Technical  Report  No.  69  which  was  also  published  in  Communications  in 
Statistics  in  1980. 

To  assess  the  precision  of  an  estimated  project  completion  time 
distribution  four  approaches  have  been  used.  If  the  activities  in  a 
project  occur  only  as  a  mixture  of  the  conaon  activity  configurations 
(series,  parallel,  Wheatstone  bridge,  double  Wheatstone  bridge,  or 
criss-cross  as  described  in  Technical  Reports  No.  48  or  No.  56),  then 
an  exact  project  completion  time  distribution  is  determined  in  the 
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so-called  Simplification  Step  of  the  project  scheduling  algorithm. 

Any  imprecision  in  the  determination  of  the  project  completion  time 
distribution  occurs  after  this  step  if  there  are  any  uncommon  activity 
configurations.  Thus  the  first  means  of  assessing  this  precision  is 
to  force  the  project  scheduling  algorithm  to  skip  the  Sinpliflcation 
Step  and  approximate  the  project  completion  time  distribution  for  a 
project  with  a  known  completion  time  distribution.  The  second  approach 
consists  of  a  Monte  Carlo  simulation  of  relatively  small  projects  which 
are  economical  to  simulate  and  then  compare  the  simulated  project 
completion  time  distributions  with  those  obtained  from  our  project  . 
scheduling  algorithm.  The  third  approach  is  to  note  that  the  project 
completion  time  is  the  maximum  of  several  dependent  sums  of  random 
variables.  The  imprecision  in  the  project  scheduling  algorithm  arises 
when  the  random  variables  in  the  sums  are  replaced  by  approximating 
two-point  discrete  random  variables.  These  approximating  discrete 
random  variables  have  the  same  mean,  variance,  and  third  moment  as 
the  random  variables  they  approximate.  Therefore  probability  inequalities 
based  on  the  first  three  moments  can  be  used  to  assess  the  precision  in 
the  estimated  project  completion  time  distribution.  This  assessment 
could  be  carried  out  using  the  probability  inequalities  known  as  Boole's 

Inequality,  Bonferroni's  Inequality,  etc.  However,  these  inequalities 
are  for  general  dependent  variables  and  not  specifically  for  the  type 
of  dependency  arising  in  project  completion  times.  The  fourth  approach 
to  assessing  the  precision  in  the  estimated  project  completion  time 
distribution  is  to  consider  the  project  completion  time  distribution 
in  its  Edgeworth  or  Gram-Charller  type  series  expansions.  Since  the 
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approximating  discrete  random  variables  have  the  same  first  three 
moments  as  the  random  variables  they  approximate,  the  first  few 
terms  of  the  series  expansions  of  the  estimated  and  actual  project 
completion  time  distributions  agree;  so  that,  bounds  on  the  remaining 
terms  in  the  expansions  provide  an  assessment  of  the  precision  of  the 
estimate.  This  fourth  approach  has  been  documented  in  Technical 
Report  No.  59. 

When  more  than  one  activity  in  a  project  requires  the  same 
Indivisible  resource,  the  project  schedule  can  resolve  this  resource 
constraint  by  specifying  the  order  in  which  these  activities  are  to 
be  performed.  Several  heuristic  criteria  for  determining  this  order 
have  been  considered  in  Technical  Report  No.  63.  The  minimum  cost 
schedule  for  a  given  project  deadline  can  often  be  found  by  deter¬ 
mining  the  optimal  schedule  Ignoring  the  resource  constraints  and  then 
resolving  any  resource  usage  conflicts  by  ordering  the  conflicting 
activities  so  as  to  minimize  the  project  completion  time.  A  simple 
procedure  for  determining  this  latter  order  is  also  given  in  Technical 
Report  No.  63. 

Apart  from  the  17  technical  reports  and  4  published  papers  on 
project  scheduling,  there  have  been  several  presentations  at  meetings 
of  both  statistical  and  operations  research  societies.  In  addition, 

P.  P.  Biemer,  S.  M.  Chang,  P.  Chou,  C.  S.  Dunn,  and  N.  E.  Fisher  wrote 
Master  of  Science  papers  concerning  project  scheduling.  E.  Arseven, 

T.  C.  Baker,  Jr.,  and  R.  K.  Spoerl  all  three  wrote  Ph.D.  dissertations 
on  project  scheduling. 
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The  computer  Implementations  of  Statistical  PERT  and  the 
deterministic  project  scheduler  can  be  obtained  from  Dr.  R.  L. 
Sielken,  Jr.,  Institute  of  Statistics,  Texas  A&M  University,  College 
Station,  Texas  77843.  GTE,  Standard  Oil,  and  NASA  have  all  taken 
advantage  of  this  offer  as  well  as  several  universities  and 
individual  researchers. 


4.  Linear  Programming  In  Statistical  Estimation  Problems: 


Estimation 

Consider  the  linear  regression  model  in  the  form 

y  «  Xg  +  e  . 

where  y  is  a  vector  of  n  observations,  X  is  an  n  x  p  matrix  of  rank 
p  of  known  constants,  6  is  a  vector  of  p  unknown  parameters  and  e  is 
a  vector  of  independent  random  variables  (noise)  symmetrically  distri¬ 
buted  with  mean  zero  and  variance  a2.  The  estimation  of  g  can  be 
obtained  under  several  different  optimality  criteria.  For  g  unrestricted, 
the  classical  least  squares  estimator, g,  where 

g  -  (XTX)_1XTY  , 

has  the  smallest  variance  among  the  class  of  unbiased  linear  functions 
of  y.  However,  the  least  squares  estimator  is  extremely  sensitive  to 
large  values  of  (e|  ,  outliers,  particularly  when  the  sample  size,  n, 
is  small  relative  to  p,  say  n  <_  2(p+l).  In  addition,  the  least  squares 
estimator  does  not  have  the  flexibility  of  allowing  restrictions  to  be 
placed  on  g  .  These  two  drawbacks  suggest  that  an  optimality  criteria 
other  than  least  squares  be  considered.  Several  authors  have  suggested 
that 


n 

S 

i-1 


should  be  minimized  with  respect  to  g  where  y^  is  the  1-th  observation 
and  XA  is  the  i-th  row  of  X.  The  estimator,  g,  which  minimizes  the  sum 
of  the  absolute  residuals  is  often  called  the  estimator. 
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Linear  programming  algorithms  can  calculate  an  estimate.  In 
fact,  they  can  even  Incorporate  several  (say,  m)  added  linear  constraints 
on  8  of  the  form  A8  ^  b  or  A6  >  b  or  AB  ■  b  where  A  is  an  m  x  p  matrix 
of  known  coefficients  for  8  and  b  is  a  known  vector  of  m  constants. 

In  Technical  Report  No.  41  we  have  shown  how  to  obtain  an  unbiased 
unrestricted  estimator  using  any  convential  linear  programming 
algorithm  and  an  Initial  unbiased  antisymmetrlcal  estimator  of  8,  say 
Bq,  where 

B  -  B0(e)  -  -  [8  -  B  (-e)l  • 

The  computer  program  MRS.  A  (Minimizes  the  Sum  of  the  Absolute 
Residuals)  uses  the  technique  described  in  Technical  Report  No.  41 
to  generate  an  unbiased  unrestricted  estimator.  The  computer 
program  MR.  A  (Minimizes  the  Absolute  Residuals)  follows  the  analogous 
procedure  when  8  is  restricted  by  linear  constraints. 

It  is  nice  to  be  able  to  compute  an  estimate.  Furthermore  it 
is  nice  that  the  estimator  is  guaranteed  to  be  unbiased  in  the 
unrestricted  case.  However,  the  nicest  feature  of  both  MRS.  A  and 
MR.  A  is  their  ability  to  also  estimate  the  covariance  matrix  of  the 
estimator,  8.  Such  an  estimated  covariance  would  usually  be  a 
prerequisite  for  making  confidence  Intervals  or  hypothesis  tests 
concerning  8  .  The  ability  to  estimate  the  covariance  of  the 
estimator  sets  MRS.  A  and  MR.  A  apart  from  other  estimation 
procedures . 
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The  computer  programs  MRS.  A  and  MR.  A  are  documented  and 
illustrated  in  Technical  Reports  No.  64  and  65  respectively.  The 
mini-Monte  Carlo  estimators  of  the  covariance  of  6  are  also  described 
therein.  Both  programs  can  be  obtained  from  Dr.  R.  L.  Sielken,  Jr., 
Institute  of  Statistics,  Texas  A&M  University,  College  Station, 

Texas  77843. 


"T-JTj 
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5.  Statistical  Control  of  Noalinear  Programming:  Confidence  Limits 
for  Global  Optima 
5.1  Technical  Description 

Mathematical  programming  problems  have  the  general  form  of  maxi¬ 
mizing  a  function  g(x)  with  respect  to  a  vector  x  which  is  restricted  to 
be  in  some  feasible  region  R.  Let 

G*  •  max  g(x)  . 
xeR 

Unfortunately,  unless  g  and  R  are  extremely  well-behaved,  it  is  quite 
often  impractical  to  find  an  x  e  R  with  g(x)  -  G*.  Hence  the  literature 
abounds  with  heuristic  procedures  claiming  to  provide  "good"  or  "near- 
optimal"  solutions  for  special  cases  of  g  and  R.  A  difficulty  with 
using  these  procedures  in  specific  problems  is  that  it  is  hard  to 
determine  how  near  such  an  approximate  solution,  say  x,  is  to  being 
optimal;  in  particular,  how  near  g(x)  is  to  G*. 

Our  research  objective  was  to  indicate  some  relatively  new  sta¬ 
tistical  procedures  for  obtaining  an  upper  confidence  limit  on  G*. 

Each  of  these  procedures  results  in  a  statement  that  with  a  chosen  con¬ 
fidence  (say  a  95Z  confidence)  G*  does  not  exceed  the  computed  confi¬ 
dence  limit  G*;  that  is 


P(G*  <  G*)  -  .95  . 


These  confidence  limit  procedures  do  not  Indicate  a  feasible  x  with 
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g(x)  ■  G*  or  even  necessarily  produce  a  feasible  x  with  g(x)  near  G*. 
Therefore,  such  procedures  are  not  Intended  as  replacements  for  "sol¬ 
ution"  finding  algorithms,  but  are  intended  as  supplements  to  such 

A  A 

algorithms  in  the  sense  that  the  difference  G*  -  g(x)  provides  a  needed 
Indication  of  just  how  "near"  the  "near-optimal"  solution,  x,  is  to 
being  optimal. 

The  following  situation  illustrates  one  type  of  setting  in  which 
a  confidence  limit  procedure  can  be  used.  A  steepest  ascent  pro¬ 
cedure  is  used  to  find  the  x  e  R  which  maximizes  g.  The  procedure 
requires  a  feasible  x  as  a  starting  point.  Hence  n  starting  points 
x^,  ...,  xq  are  selected  at  random  from  R.  Then  the  steepest  ascent 
procedure  is  carried  out  beginning  at  each  starting  point.  The  n  corre- 

A  A 

sponding  "near-optimal"  solutions  x^ ,  . . . ,  xfl  provide  an  independent 

A  A 

sample  of  G  values;  G^^  »  g(x^),  ...,  Gn(xQ)*  Let  the  ordered  values 
of  this  independent  sample  of  G  values  be  denoted  by  G^,  •••»  G(n) 
where  G^  £  ...  £ G(a)*  Then  the  confidence  limit  procedure  would 
use  the  sample  size  n  and  the  ordered  values  G^ . ^(n)  to  Pro<*uce 

A 

an  approximate  95Z  upper  confidence  limit  G*  on  the  unknown  G*.  The 

A  A 

G*  indicates  roughly  where  G*  is  and  the  difference  G*  -  G^  indi¬ 
cates  how  near  G^  is  to  G*.  The  chosen  confidence  level,  say  95%, 
means  that  approximately  95%  of  the  time,  when  an  independent  sample 

A 

of  G  values  G^,  ...,  Gq  are  observed  and  G*  computed. 


%)  £  G*  1  G*  • 
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Integer  programming  problems  are  another  setting  In  which  a  con¬ 
fidence  limit  procedure  can  be  used.  In  these  problems  the  solution 
strategy  is  usually  to  Implicitly  enumerate  the  feasible  integer  sol¬ 
utions  using  a  b ranch -and-bound  scheme.  In  such  a  scheme  the  impor¬ 
tant  task  is  to  determine  whether  for  say  fixed  values  of  x^,  ...,  x^ 

there  exist  feasible  x  ,  ...,  x  such  that  the  objective  function 

p+i  m 

g(x, ,  ....  x  )  exceeds  the  best  value  of  g  found  so  far.  If  no  such 
1  m 

completion  x  ,  ....  x  exists,  then  all  solutions  with  these  fixed 
p-ri  m 

val  »s  of  x^,  ....  Xp  have  been  effectively  enumerated.  An  approximate 
way  of  determining  whether  any  such  completion  Xp+^»  •••»  xm  of  a 
partial  solution  x^,  ....  x^  exists  is  to  randomly  select  n  feasible 
completions,  calculate  g  for  each  completion,  and  determine  a  95%  up- 

A 

per  confidence  limit  G*  on  the  maximum  value  G*  of  g  with  the  given 

*  A 

fixed  values  of  x, ,  . . . ,  x  .  If  G*  doesn't  exceed  the  best  value  of 
1  P 

g  bound  so  far,  then  all  solutions  with  these  fixed  values  of  x^,  ...,  x^ 
are  considered  to  have  been  enumerated.  Thus,  since  a  relatively  small 
sample  of  completions  can  be  evaluated  much  faster  than  an  extensive 
enumeration  of  the  completions,  a  confidence  limit  procedure  can  greatly 
facilitate  the  solution  of  integer  programming  problems. 

A  A 

Let  x  be  the  estimated  maximizer  of  g  for  a  procedure.  Then  x  is 
a  random  variable  and  so  is 


G  -  g(x)  . 

Denote  the  distribution  function  of  G  by 
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Pg(G)  -  P (G  <  G)  ; 

then 

Pq(G*)  -  1  . 


If,  In  fact,  g(x)  can  equal  the  G* ,  then  G*  is  the  maximum  of  the  ran¬ 
dom  variable  G.  The  problem  is  to  place  an  upper  confidence  limit  on 
G*  given  the  ordered  values  of  an  independent  sample  of  G  values.  Let 
this  ordered  sample  be  denoted  by  G^,  •••»  G^nj  where  G^  <_  ...  <_  G^. 

One  approach  is  to  base  the  upper  confidence  limit  for  G*  on  the 
limiting  distribution  of  the  largest  order  statistics.  Robson  and 
Whitlock  (1964)  and  Boender  et  al.  (1980)  base  their  confidence  limit 
procedures  on  the  two  largest  order  statistics  80(1  G(a)‘  Van 

Der  Watt  (1980)  used  the  k-th  largest,  G(n_k) »  aad  the  largest  order 
statistic,  G(aj ♦  determining  a  confidence  limit  on  G*. 

A  second  approach  is  to  base  the  upper  confidence  limit  for  G* 
on  the  limiting  distribution  of  the  largest  order  statistic,  G(ay 
Clough  (1969)  determines  his  confidence  limit  for  G*  assuming  that  the 
G's  themselves  implicitly  act  like  the  largest  order  statistics  of 
Independent  samples  having  a  limiting  distribution  of  the  exponential 
form.  Golden  and  Alt  (1979)  generalize  Clough's  approach  by  assuming 
that  the  limiting  distribution  is  of  the  Welbull  form.  Lardinois  (1981) 
suggests  a  modification  to  the  procedure  of  Golden  and  Alt.  Mann, 

Schafer,  and  Slngpurvalla  (1974)  suggest  a  procedure  that  also  assumes 
a  limiting  distribution  of  the  Welbull  form. 


The  third  approach  which  has  been  developed  entirely  at  Texas  A&M 
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University  bases  the  confidence  limit  on  G*  on  goodness-of-fit  sta¬ 
tistics.  Several  procedures  employing  this  approach  have  been  deve¬ 
loped.  All  of  these  procedures  determine  the  confidence  limit  G*  on 
G*  in  the  same  basic  vay. 

(1)  Begin  with  n  independent  and  identically  distributed  sample 

values  of  G  denoted  by  G  ,  G_  ,  ...»  G  .  Denote  their  ordered 

1  ^  n 

values  by  G,..,.  <  G...  <  ...  <  G,  .. 

(1)  -  (2)  —  -  (n) 

(2)  Assume  that  PG(G)  ■  P(G  £  G)  is  well  approximated  in  the 

range  G^  £  G  £  G*  by  a  specified  functional  form  F(G|G*,  a) 

where  a  is  a  vector  (a, ,  . ..,  a  )  of  unknown  parameters. 

X  m 

(3)  Specify  a  goodness-of-fit  statistic  Q  to  measure  the  goodness 
of  the  fit  of  F(*|g,  a)  to  G(n_k+;u  >  ....  G^  and  specify, 
k,  the  number  of  largest  order  statistics  to  be  considered. 

A  A  A 

(4)  For  a  given  estimate  G  of  G*,  calculate  estimates  a.  .  a  . 

1  m 

A  A 

(5)  If  the  estimated  distribution  function  F(*|g,  a)  is  a  "good 
fit"  on  the  basis  of  Q  to  the  observed  G(n_]c+^)  »  •••*  ^(Q)  * 
then  increase  the  estimate  G  of  G*  and  repeat  (4). 

(6)  The  upper  confidence  limit  G*  on  G*  is  the  largest  G  for  which 
a  "good  fit"  is  obtained. 

There  are  essentially  four  factors  which  can  be  varied  in  these  proce¬ 
dures  : 

(i)  the  goodness-of-fit  measure  Q, 

(li)  the  functional  form  of  the  approximating  distribution  func- 

A 

tion  F(*  j G ,  a) , 

(ill)  the  method  of  estimating  a,  and 
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(iv)  the  number,  k,  of  largest  order  statistics  used. 

The  three  earliest  procedures  used  quadratic  goodness-of-fit  measures 
Q  and  were  suggested  by  Hartley  and  Pfaffenberger  (1969  and  (1971)  and 
Liau,  Hartley  and  Sielken  (1973).  More  recent  research  has  focused  on 
a  wide  range  of  alternative  possibilities  for  the  factors  (i)  -  (iv) . 

Our  major  research  accomplishments  are  as  follows : 

(1)  A  modification  to  the  confidence  limit  procedure  reported 

by  Hartley  and  Pfaffenberger  (1971)  has  been  suggested,  docu¬ 
mented  and  shown  to  increase  that  procedure's  effectiveness. 

(2)  A  computer  implementation,  GLOBAL,  of  the  three  confidence 
limit  procedures  based  on  quadratic  goodness-of-fit  sta¬ 
tistics  has  been  prepared.  A  separate  shorter  version  of  the 
best  of  these  three  procedures  has  also  been  prepared.  During 
the  course  of  this  preparation  a  general  quadratic  program¬ 
ming  algorithm  was  also  implemented. 

(3)  A  new  family  of  confidence  limit  procedures  based  on  the 
Cramer-Von  Mises  type  goodness-of-fit  statistic  has  been 
created. 

(4)  A  Monte  Carlo  comparison  of  the  confidence  limit  procedures 
has  been  made.  This  study  includes  the  procedures  found  in 
the  literature  as  well  as  several  new  procedures. 

(5)  The  goodness-of-fit  approach  has  been  broadened  greatly  and 
opened  up  for  future  research. 

5.2  Research  Status 

Technical  Reports  18,  30,  32,  35,  40,  43,  and  47  served  as  a  spring- 
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board  for  our  current  research.  The  original  three  confidence  limit 
procedures  based  upon  quadratic  goodness-of-fit  statistics  are  thor¬ 
oughly  documented  in  Technical  Report  62.  The  computer  implementation, 
GLOBAL,  of  these  three  procedures  is  also  documented  in  Technical  Report 
62  as  is  a  small  Monte  Carlo  comparison  of  these  three  procedures.  The 
forthcoming  Ph.D.  dissertation  by  Mr.  Howard  Monroe  will  document  our 
recent  research  on  the  use  of  generalized  goodness-of-fit  statistics 
to  establish  confidence  limits  on  global  optima.  The  Monte  Carlo  com¬ 
parison  of  all  currently  known  confidence  limit  procedures  for  global 
optima  is  also  documented  in  that  dissertation. 

Our  research  in  this  attention  has  received  considerable  attention 
at  the  International  meetings  of  the  Operations  Research  Society  of 
America,  the  Institute  of  Management  Sciences,  and  the  Canadian  Oper¬ 
ations  Research  Society.  Dr.  R.  L.  Slelken,  Jr.  presented  our  initial 
findings  at  the  1979  meeting  and  was  subsequently  invited  to  make  pre¬ 
sentations  in  this  area  at  the  1980,  1981,  and  1982  meetings. 


